Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids

Authors

  • Aurelien Bouteiller
  • Hinde-Lilia Bouziane
  • Thomas Hérault
  • Pierre Lemarinier
  • Franck Cappello
Abstract

Time sharing between cluster resources in a grid is a major issue in cluster and grid integration. Classical grid architecture involves a higher-level scheduler which submits non-overlapping jobs to the independent batch schedulers of each cluster of the grid. The sequentiality induced by this approach does not fit with the expected number of users and job heterogeneity of grids. Time sharing techniques address this issue by allowing simultaneous executions of many applications on the same resources. Co-scheduling and gang scheduling are the two best-known techniques for time sharing cluster resources. Co-scheduling relies on the operating system of each node to schedule the processes of every application. Gang scheduling ensures that the same application is scheduled on all nodes simultaneously. Previous work has shown that co-scheduling techniques outperform gang scheduling when physical memory is not exhausted. In this paper, we introduce a new hybrid sharing technique providing checkpoint-based explicit memory management. It consists of co-scheduling parallel applications within a set until the memory capacity of the node is reached, and using gang-scheduling-related techniques to switch from one set to another. We experimentally compare the merits of the three solutions (co-scheduling, gang scheduling, and hybrid scheduling) in the context of out-of-core computing, which is likely to occur in the grid context, where many users share the same resources. Additionally, we address the problem of heterogeneous applications by comparing hybrid scheduling to an optimized version relying on paired scheduling. The experiments show that the hybrid solution is as efficient as the co-scheduling technique when the physical memory is not exhausted, can benefit from the paired scheduling optimization technique when applications are heterogeneous, and is more efficient than gang scheduling and co-scheduling when physical memory is exhausted.
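The hybrid idea in the abstract (co-schedule applications inside a set that fits in node memory, then switch between sets as a gang scheduler would) can be illustrated with a small sketch. This is not the authors' implementation: the first-fit grouping policy, the function names `build_sets` and `active_set_per_slice`, the fixed round-robin order, and the memory figures are all assumptions made for illustration, and the checkpoint-based memory management used to switch sets is not modeled.

```python
from typing import Dict, List


def build_sets(app_memory_mb: Dict[str, int], node_memory_mb: int) -> List[List[str]]:
    """Group applications into sets whose combined per-node memory
    footprint fits in physical memory.

    Assumption: a simple first-fit policy; the paper may use a different one.
    """
    sets: List[List[str]] = []
    free_mb: List[int] = []  # remaining memory for each set
    for name, need in app_memory_mb.items():
        if need > node_memory_mb:
            raise ValueError(f"{name} alone exceeds node memory")
        for i, remaining in enumerate(free_mb):
            if need <= remaining:
                # Fits next to the applications already in this set.
                sets[i].append(name)
                free_mb[i] -= need
                break
        else:
            # No existing set has room: open a new one.
            sets.append([name])
            free_mb.append(node_memory_mb - need)
    return sets


def active_set_per_slice(sets: List[List[str]], slices: int) -> List[List[str]]:
    """Return the set that is resident during each time slice.

    Within a slice, the node's operating system co-schedules the processes
    of the active set; between slices, the whole set is switched out
    (hypothetical round-robin order).
    """
    if not sets:
        return []
    return [sets[i % len(sets)] for i in range(slices)]


if __name__ == "__main__":
    # Hypothetical per-node footprints in MB for four MPI applications.
    apps = {"A": 600, "B": 300, "C": 500, "D": 400}
    groups = build_sets(apps, node_memory_mb=1000)
    for slice_index, group in enumerate(active_set_per_slice(groups, 4)):
        print(slice_index, group)
```

When every application fits in memory at once, this sketch produces a single set and degenerates to plain co-scheduling, which matches the abstract's claim that the hybrid approach behaves like co-scheduling while physical memory is not exhausted.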


Similar resources

MPI support on opportunistic grids based on the InteGrade middleware

The Message Passing Interface (MPI) is a popular programming model for parallel applications. Support for MPI in grid middleware is important for the widespread use of grids for parallel programming. This enables existing parallel applications to be executed on large-scale grids, as opposed to being restricted to local clusters. In the specific case of opportunistic grids, the use of idle compu...


Preemptive Task Execution and Scheduling of Parallel Programs in Message-Passing Systems

The scheduling of tasks within a parallel program onto the underlying available processors has been studied for some time. To date, solutions to this problem generally assume that communication only occurs at the start or end of each parallel task, i.e., the child task can only start its execution when all its parent tasks complete and have sent data to it. This is termed non-preemptive task s...


MPC: A Unified Parallel Runtime for Clusters of NUMA Machines

Over the last decade, the Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, the architecture of cluster nodes is currently evolving from small symmetric shared memory multiprocessors towards massively multicore, Non-Uniform Memory Access (NUMA) hardware. Although regular MPI implementations ar...


Static/Dynamic Analyses for Validation and Improvements of Multi-Model HPC Applications. (Analyse statique/dynamique pour la validation et l'amélioration des applications parallèles multi-modèles)

Supercomputing plays an important role in several innovative fields, speeding up prototyping or validating scientific theories. However, supercomputers are evolving rapidly and now have millions of processing units, raising the question of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel application...


Distributed Scheduling and monitoring service leveraging FaBRiQ as a building block for CloudKon+

In today's world, the scientific community is moving towards distributed systems, which play an important role in achieving good performance and scalability. Task scheduling and execution over large-scale distributed systems play an important role in achieving good performance and high system utilization [15]. Most of today's state-of-the-art job execution systems are centralized architectures,...



Journal title:
  • IJHPCA

Volume 20  Issue 

Pages  -

Publication date 2006